Efficient Record Linkage Algorithms Using Complete Linkage Clustering
Similar resources
Efficient Record Linkage Algorithms Using Complete Linkage Clustering.
Datasets from different agencies often contain records of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. Many of the available algorithms for record linkage are prone to either time inefficiency or low accuracy in finding matches and non-matches among the records....
Summarization Algorithms for Record Linkage
Record linkage has received significant attention in recent years due to the plethora of data sources that have to be integrated to facilitate data analyses. In several cases, such an integration involves disparate data sources containing huge volumes of records and must be performed in near real-time in order to support critical applications. In this paper, we propose the first summarization a...
Poisoning Complete-Linkage Hierarchical Clustering
Clustering algorithms are widely adopted in security applications as a vehicle to detect malicious activities, although little attention has been paid to preventing deliberate attacks from subverting the clustering process itself. Recent work has introduced a methodology for the security analysis of data clustering in adversarial settings, aimed at identifying potential attacks against clustering al...
Efficient sequential and parallel algorithms for record linkage
BACKGROUND AND OBJECTIVE: Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any n...
Complete similarity blocking for record linkage using bit vectors
Information about individual people (or entities in general) can be distributed among different data sets, or among different records in a single data set. In many applications, there is no unique identifying field (such as a primary key) that makes it possible to link all pieces of information about the subjects (a task known as record linkage). Other fields can be used to link records, but these fields,...
Journal
Journal title: PLOS ONE
Year: 2016
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0154446